
Latency vs Throughput

📌 Latency

Definition: The time taken to process a single request or operation, from initiation to completion.

  • Unit: Measured in milliseconds (ms) or seconds
  • Analogy: The time it takes a single car to travel from point A to point B

Examples in System Design

  • Time taken for an API call to return a response
  • Time for a database to fetch one record
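To make this concrete, latency can be measured by timing a single operation end to end. A minimal Python sketch (the URL is a placeholder, and the call needs network access):

```python
import time
import urllib.request

def measure_latency(url: str) -> float:
    """Time one request from initiation to completion, in milliseconds."""
    start = time.perf_counter()
    urllib.request.urlopen(url, timeout=5).read()
    return (time.perf_counter() - start) * 1000

# https://example.com is a placeholder; substitute a real endpoint
print(f"Latency: {measure_latency('https://example.com'):.1f} ms")
```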

👉 Low latency means fast response times. It's critical for user-facing systems such as search engines, online games, and trading systems.


📌 Throughput

Definition: The number of requests a system can handle per unit of time.

  • Unit: Measured in requests per second (RPS), queries per second (QPS), or transactions per second (TPS)
  • Analogy: The number of cars passing through a highway per minute

Examples in System Design

  • A web server handling 10,000 requests/sec
  • Kafka processing 1 million messages/sec
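Throughput can be estimated the same way by counting how many operations complete in a fixed window. A minimal single-threaded sketch, where `handler` stands in for whatever operation the system serves:

```python
import time

def measure_throughput(handler, duration_sec: float = 1.0) -> float:
    """Count how many calls to handler() complete within a time window."""
    completed = 0
    deadline = time.perf_counter() + duration_sec
    while time.perf_counter() < deadline:
        handler()
        completed += 1
    return completed / duration_sec

# A trivial in-process operation as a stand-in for a real request handler
print(f"Throughput: {measure_throughput(lambda: sum(range(100))):.0f} ops/sec")
```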

👉 High throughput means the system can handle more load.


⚖️ Relationship Between Latency and Throughput

They are related but not the same:

Different Combinations

  • Low latency but low throughput: A system that responds quickly but can't handle many users at once
  • High throughput but high latency: A batch processing system that processes thousands of records at once but takes minutes before any individual result is ready

Trade-offs

Improving one often comes at the cost of the other (a worked example follows this list):

  • Adding parallelism can improve throughput but may increase latency due to coordination overhead
  • Optimizing for low latency (e.g., caching, indexing) may reduce maximum throughput; for example, an index speeds up reads but adds work to every write
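One way to reason about the relationship quantitatively is Little's Law: on average, concurrency = throughput × latency. A back-of-the-envelope sketch with assumed numbers:

```python
# Little's Law: concurrency = throughput * latency
# Rearranged:   throughput  = concurrency / latency

latency_sec = 0.050   # assumed: 50 ms per request
concurrency = 200     # assumed: 200 requests in flight at once

print(f"{concurrency / latency_sec:.0f} requests/sec")   # 4000 requests/sec

# Halving latency doubles throughput at the same concurrency:
print(f"{concurrency / 0.025:.0f} requests/sec")          # 8000 requests/sec
```

At fixed concurrency, cutting latency raises throughput; conversely, once a system saturates, pushing more load makes requests queue, which raises latency without raising throughput.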

📊 Real-World Example: Ride-sharing App (Uber, Ola)

Metric       Description
Latency      Time taken to show nearby drivers after you open the app
Throughput   Number of ride requests the system can handle globally per second

🛠 Optimization Strategies

To Reduce Latency

  • Use caching (Redis, CDN); a minimal sketch follows this list
  • Optimize database queries (indexes, denormalization)
  • Reduce network hops
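To make the caching idea concrete, here is a cache-aside pattern in miniature; `slow_db_fetch` is a hypothetical stand-in for a real database query, simulated with a sleep:

```python
import functools
import time

def slow_db_fetch(user_id: int) -> dict:
    """Hypothetical stand-in for a real database query (~50 ms)."""
    time.sleep(0.05)
    return {"id": user_id, "name": f"user-{user_id}"}

# Cache-aside in miniature: repeated reads are served from memory,
# cutting latency from ~50 ms to microseconds on a hit.
@functools.lru_cache(maxsize=1024)
def fetch_user(user_id: int) -> dict:
    return slow_db_fetch(user_id)  # runs only on a cache miss

fetch_user(42)  # miss: ~50 ms
fetch_user(42)  # hit: near-instant
```

In production the same pattern typically uses a shared cache such as Redis rather than per-process memory, so all servers see the same cached entries.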

To Increase Throughput

  • Horizontal scaling (more servers; see the sketch after this list)
  • Message queues (Kafka, RabbitMQ)
  • Batch processing
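The effect of adding capacity can be seen even inside one process: parallel workers raise throughput while each request's latency stays roughly the same. A sketch with a simulated I/O-bound handler (the 10 ms wait is an assumption):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(i: int) -> int:
    """Hypothetical I/O-bound handler, simulated with a 10 ms wait."""
    time.sleep(0.010)
    return i

requests = range(100)

# Sequentially: 100 requests x 10 ms ~= 1 s (~100 requests/sec).
# With 10 workers the same work finishes in ~0.1 s (~1000 requests/sec),
# while each individual request still takes ~10 ms.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    list(pool.map(handle_request, requests))
elapsed = time.perf_counter() - start
print(f"{len(requests) / elapsed:.0f} requests/sec")
```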

💡 Key Takeaways

Latency = the time one request takes

Throughput = how many requests complete per unit of time

Understanding both metrics is crucial for designing scalable systems that meet user expectations and business requirements.